A large deviation analysis of the solving complexity of random 3-Satisfiability instances slightly
below threshold is presented. While finding a solution for such instances demands an exponential 
effort with high probability, we show that an exponentially small fraction of resolutions require a 
computation scaling linearly in the size of the instance only. This exponentially small probability 
of easy resolutions is analytically calculated, and the corresponding exponent shown to be smaller 
(in absolute value) than the growth exponent of the typical resolution time. Our study therefore 
gives some theoretical basis to heuristic stop-and-restart solving procedures, and suggests a natural 
cut-off (the size of the instance) for the restart. 

Computational problems are usually divided into two classes. They are easy if there exists a solving procedure whose
running time grows at most polynomially with the size of the problem, or hard if no such algorithm is believed to exist
and the best available procedures may require an exponentially growing time. The polynomial vs. exponential
classification was enriched in the last decades through the derivation of quantitative bounds on resolution complexity,
and the study of the average performance of various resolution algorithms for computational tasks with model input
distributions.

In this letter, we show that, though the polynomial/exponential dichotomy certainly applies to the typical resolution
complexity of computational problems, it may not hold for large deviations from typical behavior. Problems that are
typically exponentially hard may sometimes be solved in polynomial time, a phenomenon we take advantage of to
accelerate resolution drastically.

We concentrate here on random 3-Satisfiability (3-SAT), a paradigm of hard combinatorial problems that has recently
been studied using statistical physics tools and concepts (as have, e.g., number partitioning and vertex cover), and to
which computer scientists have devoted great attention over the past years. An instance of random 3-SAT is defined
by a set of $M$ constraints (clauses) on $N$ Boolean (i.e. True or False) variables. Each clause is the logical OR of
three randomly chosen variables, or of their negations. The question is to decide whether there exists a logical
assignment of the variables satisfying all the clauses (called a solution). The best currently known algorithm to solve
3-SAT is the Davis-Putnam-Loveland-Logemann (DPLL) procedure (Fig. 1). The sequence of assignments of variables made
by DPLL in the course of instance solving can be represented as a search tree (Fig. 2), whose size $Q$ (number of
nodes) is a convenient measure of the complexity. For very large sizes ($M, N \to \infty$ at fixed ratio
$\alpha = M/N$), some static and dynamical phase transitions arise. Instances with a ratio of clauses per variable
$\alpha > \alpha_C \simeq 4.3$ are almost surely (a.s.) unsatisfiable, and obtaining proofs of refutation requires an
exponential effort. Below the static threshold $\alpha_C$, instances are a.s. satisfiable, but finding a solution may
be easy or hard, depending on the value of $\alpha$. A dynamical transition takes place at $\alpha_L \simeq 3.003$
(for the heuristic used by DPLL shown in Fig. 1), separating a polynomial regime ($\alpha < \alpha_L$: $Q \sim N$,
search tree A in Fig. 2) from an exponential regime ($\alpha > \alpha_L$: $Q \sim 2^{N\omega}$, search tree B). This
pattern of complexity and the value of $\omega(\alpha)$ were obtained through an analysis of DPLL dynamics, reminiscent
of real-space renormalization in statistical physics. DPLL generates a dynamical flow of the instance, whose trajectory
lies in the phase diagram of the 2+p-SAT model, an extension of 3-SAT where $p \le 1$ is the fraction of 3-clauses
(Fig. 2).
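As an illustration of the instance ensemble described above, the following Python sketch draws a random 3-SAT instance and checks a candidate assignment. The signed-integer encoding of literals, the function names, and the choice of three distinct variables per clause are assumptions of this sketch, not specifications taken from the text.

```python
import random

def random_3sat(n_vars, alpha, rng):
    """Draw M = round(alpha * N) clauses, each a tuple of 3 signed literals.

    Literal +i stands for variable i, -i for its negation (variables are
    numbered 1..N). This encoding is an assumption of the sketch.
    """
    n_clauses = round(alpha * n_vars)
    clauses = []
    for _ in range(n_clauses):
        variables = rng.sample(range(1, n_vars + 1), 3)  # three distinct variables
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses

def satisfies(assignment, clauses):
    """True if every clause contains at least one literal made true by `assignment`."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)

rng = random.Random(0)
instance = random_3sat(n_vars=20, alpha=3.5, rng=rng)
```

Here `assignment` is a dictionary mapping each variable index to True or False; `satisfies` is the question that a solver such as DPLL has to answer constructively.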

We focus hereafter on the large deviations of complexity in the upper sat phase $\alpha_L < \alpha < \alpha_C$. Using
numerical experiments and analytical calculations, we show that, though the complexity $Q$ a.s. grows as
$2^{N\omega}$, there is a finite, but exponentially small, probability $2^{-N\zeta}$ that $Q$ is bounded from above by
$N$ only. In other words, while finding solutions to these sat instances is almost always exponentially hard, it is
very rarely easy (polynomial time). Taking advantage of the fact that $\zeta$ is smaller than $\omega$, we show how
systematic restarts of the heuristic may substantially decrease the overall search cost. Our study therefore gives some
theoretical basis to stop-and-restart solving procedures, empirically known to be efficient [15], and suggests a
natural cut-off for the stop.

Distributions of resolution times $Q$ for $\alpha = 3.5$ are reported in Fig. 3. The histogram of
$\omega = (\log_2 Q)/N$ essentially exhibits a narrow peak (left side) followed by a wider bump (right side). As $N$
grows, the right peak acquires more and more weight, while the left peak progressively disappears. The center of the
right peak gets slightly shifted to the left, but reaches a finite value $\omega^* \simeq 0.035$ as $N \to \infty$.
This right peak thus corresponds to the core of exponentially hard resolutions: resolutions of instances a.s. require a
time scaling as $2^{N\omega^*}$ as the size $N$ gets large, in agreement with the above discussion.

On the contrary, the abscissa of the maximum of the left peak vanishes as $(\log_2 N)/N$ when the size $N$ increases,
indicating that the left peak accounts for polynomial (linear) resolutions. Its maximum is located at $Q/N$ between
0.2 and 0.25, with weak dependence upon $N$. The cumulative probability $P_{lin}$ to have a complexity $Q$ less than,
or equal to, $N$ decreases exponentially: $P_{lin} \simeq 2^{-N\zeta}$ with $\zeta \simeq 0.011 \pm 0.001$ (Inset of
Fig. 4). In the following we will concentrate on linear resolutions only (an analysis of the distribution of
exponential resolutions for the problem of the vertex covering of random graphs can be found in [16]).
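The quantities plotted in the histograms and in the cumulative probability can be computed from a list of measured complexities as in the short Python sketch below. The function names are ours, and the sample values in the usage note are made up for illustration only. A single-size estimate of the decay rate ignores non-exponential prefactors, which is why a slope over several sizes is the more reliable measurement.

```python
import math

def omega(q, n):
    """Rescaled logarithm of the complexity, omega = log2(Q) / N."""
    return math.log2(q) / n

def p_lin(complexities, n):
    """Fraction of runs whose complexity Q does not exceed N (linear resolutions)."""
    return sum(q <= n for q in complexities) / len(complexities)

def zeta_from_p_lin(p, n):
    """Decay rate defined through P_lin = 2^(-N * zeta), prefactors neglected."""
    return -math.log2(p) / n
```

For instance, with the invented sample `[40, 55, 3000, 12000]` at `n = 200`, `p_lin` returns 0.5 and `zeta_from_p_lin(0.5, 200)` returns 0.005.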

Further numerical investigations show that, in easy resolutions, the solution is essentially found at the end of the
first branch, with a search tree of type A, and not B, in Fig. 2. Easy resolution trajectories are able to cross the
'dangerous' region extending beyond point D in Fig. 2, contrary to most trajectories, which backtrack earlier. Beyond
D, unit-clauses (UC) indeed accumulate. Their number $C_1$ becomes of the order of $N$ ($C_1/N \simeq 0.022$ for
$\alpha = 3.5$), and the probability that the branch survives, i.e. that no two contradictory UC are present, is
exponentially small in $N$, in agreement with the scaling of the left peak weight in Fig. 3.

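Calculation of $\zeta$ requires the analysis of the first descent in the search tree (Fig. 2). Each time DPLL assigns a
variable, some clauses are eliminated, while others are reduced or left unchanged (Fig. 1). We thus characterize an
instance by its state $\mathbf{C} = (C_1, C_2, C_3)$, where $C_j$ is the number of $j$-clauses it includes
($j = 1, 2, 3$). Initially, $\mathbf{C} = (0, 0, \alpha_0 N)$. Let us call $P(\mathbf{C};T)$ the probability that the
assignment of $T$ variables has produced no contradiction and an instance with state $\mathbf{C}$. $P$ obeys a
Markovian evolution $P(\mathbf{C};T+1) = \sum_{\mathbf{C}'} K(\mathbf{C},\mathbf{C}';T)\, P(\mathbf{C}';T)$, where the
entries of the transition matrix $K$ read

$$ K(\mathbf{C},\mathbf{C}';T) = B_{p_3}^{C'_3,\Delta_3} \sum_{w_3=0}^{\Delta_3} B_{1/2}^{\Delta_3,w_3}
   \sum_{Z_2=0}^{C'_2-\nu} B_{p_2}^{C'_2-\nu,Z_2} \sum_{w_2=0}^{Z_2} B_{1/2}^{Z_2,w_2}\, \delta_{Z_2-\Delta_2-w_3+\nu}
   \sum_{Z_1=0}^{C'_1-1+\nu} \frac{1}{2^{Z_1}}\, B_{p_1}^{C'_1-1+\nu,Z_1}\, \delta_{Z_1-\Delta_1-w_2+1-\nu} \qquad (1) $$

where $\delta_C$ denotes the Kronecker function: $\delta_C = 1$ if $C = 0$, $0$ otherwise. Variables appearing in (1)
are as follows. $\Delta_j = C'_j - C_j$ and $\nu = \delta_{C'_1}$. $Z_j$ is the number of $j$-clauses, other than the
clause selected by the heuristic, that contain the $(T+1)$th assigned variable and therefore leave the set of
$j$-clauses ($Z_3 = \Delta_3$); among them, $w_j$ are reduced to $(j-1)$-clauses and the others are satisfied. The
factor $1/2^{Z_1}$ discards the assignments that violate a unit clause. These are stochastic variables drawn from
several binomial distributions $B_p^{L,K} = \binom{L}{K}\, p^K (1-p)^{L-K}$. Parameter $p_j = j/(N-T)$ equals the
probability that a $j$-clause contains the variable just assigned by DPLL.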
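The Markov chain on clause counts can also be sampled directly, which gives a quick Monte Carlo view of the survival of the first branch. The Python sketch below is a minimal illustration under stated assumptions: the function names are ours, the binomial sampler is a naive sum of Bernoulli trials, the probabilities $j/(N-T)$ are clamped at 1 for the last few variables, and a 3-clause is selected when neither unit nor 2-clauses are present.

```python
import random

def binomial(n, p, rng):
    """Number of successes in n independent trials of probability p (naive sampler)."""
    return sum(rng.random() < p for _ in range(n))

def first_branch(n_vars, alpha0, rng):
    """Sample the clause counts (C1, C2, C3) along the first DPLL branch.

    Returns (survived, T, (C1, C2, C3)); survived is False if a contradiction
    occurred while assigning variable number T.
    """
    c1, c2, c3 = 0, 0, round(alpha0 * n_vars)
    for t in range(n_vars):
        if c1 == c2 == c3 == 0:
            return True, t, (0, 0, 0)          # no clause left: solution found
        remaining = n_vars - t                  # unassigned variables, N - T
        # Select one clause of the shortest available length and satisfy it.
        if c1 > 0:
            c1 -= 1
        elif c2 > 0:
            c2 -= 1
        else:
            c3 -= 1
        # Other unit clauses contain the variable with probability 1/(N-T);
        # each of them is violated with probability 1/2 (contradiction).
        z1 = binomial(c1, min(1.0, 1.0 / remaining), rng)
        if any(rng.random() < 0.5 for _ in range(z1)):
            return False, t + 1, (c1, c2, c3)
        z2 = binomial(c2, min(1.0, 2.0 / remaining), rng)  # 2-clauses hit
        w2 = binomial(z2, 0.5, rng)                        # of which reduced to UC
        z3 = binomial(c3, min(1.0, 3.0 / remaining), rng)  # 3-clauses hit
        w3 = binomial(z3, 0.5, rng)                        # of which reduced to 2-clauses
        c1 += w2 - z1
        c2 += w3 - z2
        c3 -= z3
    return True, n_vars, (c1, c2, c3)
```

Repeating `first_branch(n, 3.5, rng)` many times and recording the fraction of surviving runs would give a direct numerical estimate of the first-branch survival probability for that size.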

The introduction of the generating function $P(\mathbf{y};T) = \sum_{\mathbf{C}} e^{\mathbf{y}\cdot\mathbf{C}}\,
P(\mathbf{C};T)$, with $\mathbf{y} = (y_1, y_2, y_3)$, allows us to express the evolution equation for the state
probabilities in a compact manner,

$$ P(\mathbf{y};T+1) = e^{-g_1(\mathbf{y})}\, P(\mathbf{g}(\mathbf{y});T)
   + \left( e^{-g_2(\mathbf{y})} - e^{-g_1(\mathbf{y})} \right) P(-\infty, g_2(\mathbf{y}), g_3(\mathbf{y});T) \qquad (2) $$

where $g_j(\mathbf{y}) = y_j + \ln(1 + \gamma_j(\mathbf{y})/N)$ and $\gamma_j(\mathbf{y}) = \gamma_j(y_j, y_{j-1}) =
j\,(e^{-y_j}(1 + e^{y_{j-1}})/2 - 1)/(1 - t)$ for $j = 1, 2, 3$ ($y_0 \equiv -\infty$).
From (1), the $C_j$'s undergo $O(1)$ changes each time a variable is fixed. After $T = tN$ assignments, the densities
$c_j = C_j/N$ of clauses have been modified by $O(1)$. This translates into large-$N$ Ansätze in which both the state
probability $P(\mathbf{C};T)$, as a function of the densities $\mathbf{c}$ and of $t$, and the generating function,
$P(\mathbf{y};T) = e^{N\varphi(\mathbf{y};t)}$, are exponential in $N$, up to terms that are non-exponential in $N$;
the two exponents are simply related to each other through a Legendre transform. In particular,
$\varphi(\mathbf{0};t)$ is the logarithm of the probability (divided by $N$) that the first branch has not been hit by
any contradiction after a fraction $t$ of variables have been assigned. The most probable values of the densities
$c_j(t)$ of $j$-clauses are equal to the partial derivatives of $\varphi$ at $\mathbf{y} = \mathbf{0}$.
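A short derivation may help to see where the compact evolution equation for the generating function comes from. The factorization below is our own restatement, assuming that each $j$-clause not selected by the heuristic is affected independently, with probability $p_j = j/(N-T)$, by the assigned variable, and that it is then satisfied or reduced with probability 1/2 each; $\varphi$ plays no role here and $G$ is used as a local name for the generating function.

```latex
% A j-clause contributes a factor e^{y_j} before the assignment. Afterwards it is
% unchanged (prob. 1-p_j), satisfied (prob. p_j/2) or reduced (prob. p_j/2):
\begin{align*}
e^{y_j} \;\longrightarrow\;
  (1-p_j)\,e^{y_j} + \frac{p_j}{2} + \frac{p_j}{2}\,e^{y_{j-1}}
  &= e^{y_j}\left[ 1 + p_j\left( e^{-y_j}\,\frac{1+e^{y_{j-1}}}{2} - 1 \right) \right]
   = e^{g_j(\mathbf{y})} , \\
% For j = 1 a reduced clause is a contradiction and carries zero weight: y_0 = -\infty.
% The clause selected by the heuristic is removed: a unit clause if C_1 > 0,
% a 2-clause if C_1 = 0. Summing over the states at step T:
G(\mathbf{y};T+1)
  &= \sum_{\mathbf{C}} P(\mathbf{C};T)\, e^{\mathbf{g}\cdot\mathbf{C}}
     \left[ (1-\delta_{C_1})\, e^{-g_1} + \delta_{C_1}\, e^{-g_2} \right] \\
  &= e^{-g_1}\, G(\mathbf{g};T)
     + \left( e^{-g_2} - e^{-g_1} \right) G(-\infty, g_2, g_3; T) ,
\end{align*}
% since setting the first argument to -\infty keeps only the states with C_1 = 0.
```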

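When DPLL starts running on a 3-SAT instance, clauses are reduced and some UC are generated. They are then eliminated
through UC propagation, and splits occur frequently (Fig. 1). The number $C_1$ of UC remains bounded with respect to
the instance size $N$, and the density $c_1(t) = \partial\varphi/\partial y_1$ identically vanishes: $\varphi$ does not
depend on $y_1$, and $\varphi(y_2, y_3; t)$ obeys the following partial differential equation (PDE),

$$ \frac{\partial\varphi}{\partial t} = -y_2 + \gamma_2(y_2, y_2; t)\, \frac{\partial\varphi}{\partial y_2}
   + \gamma_3(y_3, y_2; t)\, \frac{\partial\varphi}{\partial y_3} \qquad (3) $$

where $\gamma_j(y_j, y_{j-1}; t)$ is the function entering (2), evaluated for $j = 2$ at $y_1 = y_2$.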

We have solved analytically PDE (3) with initial condition $\varphi(\mathbf{y}; 0) = \alpha_0\, y_3$. The high
probability scenario is obtained for $y_2 = y_3 = 0$: $\varphi(0, 0; t) = 0$ indicates that the probability of survival
of the branch is not exponentially small in $N$, and the partial derivatives $c_2(t)$, $c_3(t)$ give the typical
densities of 2- and 3-clauses, in full agreement with Chao and Franco's result. We plot in Fig. 2 the corresponding
resolution trajectories for various initial ratios $\alpha_0$, using the change of variables $p = c_3/(c_2 + c_3)$,
$\alpha = (c_2 + c_3)/(1 - t)$. Our calculation furthermore provides a complete description of rare deviations of the
resolution trajectory from its highly probable locus, giving access to the exponentially small probabilities that $p$,
$\alpha$ differ from their most probable values at 'time' $t$.
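The typical trajectory can be made concrete with a few lines of Python. The closed forms used below are not quoted from the text: they are the solution of the rate equations $dc_3/dt = -3c_3/(1-t)$ and $dc_2/dt = 3c_3/(2(1-t)) - c_2/(1-t) - 1$ with $c_2(0) = 0$, $c_3(0) = \alpha_0$, which we assume to describe the most probable densities while unit clauses remain $O(1)$. The function names and the bisection helper are ours. The sketch locates the time at which the trajectory reaches the line $\alpha = 1/(1-p)$, equivalent to $c_2 = 1-t$, and the smallest initial ratio for which this happens.

```python
import math

def typical_densities(t, alpha0):
    """Typical densities (c2, c3) after a fraction t of variables has been assigned."""
    u = 1.0 - t
    c3 = alpha0 * u ** 3
    c2 = u * (0.75 * alpha0 * (1.0 - u * u) + math.log(u))
    return c2, c3

def to_phase_diagram(t, alpha0):
    """Change of variables p = c3/(c2+c3), alpha = (c2+c3)/(1-t)."""
    c2, c3 = typical_densities(t, alpha0)
    return c3 / (c2 + c3), (c2 + c3) / (1.0 - t)

def bisect(f, lo, hi, tol=1e-12):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == (f_lo > 0):
            lo, f_lo = mid, f(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)

def t_D(alpha0):
    """First time at which c2/(1-t) = 1, i.e. the trajectory reaches alpha = 1/(1-p)."""
    t_peak = 1.0 - math.sqrt(2.0 / (3.0 * alpha0))  # where c2/(1-t) is maximal
    excess = lambda t: typical_densities(t, alpha0)[0] / (1.0 - t) - 1.0
    return bisect(excess, 0.0, t_peak)

def alpha_L():
    """Smallest initial ratio whose typical trajectory touches c2/(1-t) = 1."""
    peak_excess = lambda a: 0.75 * a - 0.5 + 0.5 * math.log(2.0 / (3.0 * a)) - 1.0
    return bisect(peak_excess, 1.0, 4.0)
```

Under these assumptions, `t_D(3.5)` and `alpha_L()` can be compared with the values 0.308 and 3.003 quoted in the text; `t_D` is only meaningful for initial ratios above `alpha_L()`.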

The assumption $C_1 = O(1)$ breaks down for the most probable trajectory at some fraction $t_D$, e.g.
$t_D \simeq 0.308$ for $\alpha_0 = 3.5$, at which the trajectory hits point D in Fig. 2. Beyond D, UC accumulate, and
the probability of survival of the first branch becomes exponentially small in $N$. Variables are almost always
assigned through unit-propagation: $c_1 > 0$. $\varphi$ now depends on $y_1$ and, from (2), obeys the following PDE,

$$ \frac{\partial\varphi}{\partial t} = -y_1 + \sum_{j=1}^{3} \gamma_j(y_j, y_{j-1}; t)\,
   \frac{\partial\varphi}{\partial y_j} , \qquad y_0 \equiv -\infty \qquad (4) $$

We have solved PDE (4) through an expansion of $\varphi$ in powers of $\mathbf{y}$, whose coefficients obey, from (4),
a set of coupled linear ODEs. The initial conditions for the ODEs are chosen to match the expansion of the exact
solution of (3), that is, the typical trajectory and its large deviations, at time $t_D$. The quality of the
approximation improves rapidly with the order $k$ of the expansion, and no difference was found between the $k = 3$ and
$k = 4$ results. $c_1$ first increases, reaches its top value $(c_1)^{max}$, then decreases and vanishes at $t_{D'}$,
when the trajectory comes out of the "dangerous" region where contradictions a.s. occur (Fig. 2). The probability of
survival scales as $2^{-N\zeta}$ for large $N$, with $\zeta = -\varphi(\mathbf{0}; t_{D'})/\ln 2$. The calculated
values $\zeta \simeq 0.01$, $(c_1)^{max} \simeq 0.022$ and $Q/N \simeq 0.21$ for $\alpha = 3.5$ are in very good
agreement with numerics. Fig. 4 shows the agreement between theory and simulations over the whole range
$\alpha_L < \alpha < \alpha_C$.

The existence of rare but easy resolutions suggests the use of a systematic stop-and-restart (S&R) procedure to
speed up resolution: if a solution is not found before $N$ splits, DPLL is stopped and rerun after some random
permutations of the variables and clauses. The expected number $N_{rest}$ of restarts necessary to find a solution
being equal to the inverse probability $1/P_{lin}$ of linear resolutions, the resulting complexity should scale as
$N\, 2^{0.011 N}$ for $\alpha = 3.5$, with an exponential gain with respect to the DPLL one-run complexity,
$2^{0.035 N}$. Results of S&R experiments are reported in Fig. 4. The typical number $N_{rest} = 2^{N\bar\zeta}$ of
restarts indeed grows exponentially with the size $N$, with a rate $\bar\zeta = 0.0115 \pm 0.001$ equal to $\zeta$
within error bars. Performances are greatly enhanced by the use of S&R (see Fig. 4 for a comparison between
$\bar\zeta$ and $\omega$). While with usual DPLL we were able to solve instances with 500 variables in about one day of
CPU time for $\alpha = 3.5$, instances with 1000 variables were solved with S&R in 15 minutes on the same computer.
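The restart loop and the cost comparison above can be sketched as follows. This is a minimal illustration, not the code used for the experiments: `run_once` is a hypothetical callable standing for one cut-off DPLL run on a freshly permuted instance, and the helper names are ours. The two cost functions simply evaluate the logarithms of $N\,2^{\zeta N}$ and $2^{\omega N}$ with the rates quoted in the text.

```python
import math

def stop_and_restart(run_once, max_restarts):
    """Call run_once() until it returns a non-None result.

    run_once is assumed to return a solution, or None when the cut-off was hit.
    Returns (solution, number_of_runs_used).
    """
    for attempt in range(1, max_restarts + 1):
        solution = run_once()
        if solution is not None:
            return solution, attempt
    return None, max_restarts

def log2_cost_restart(n, zeta):
    """log2 of N * 2^(zeta N): about 2^(zeta N) runs, each stopped after N splits."""
    return math.log2(n) + zeta * n

def log2_cost_single_run(n, omega):
    """log2 of the typical one-run complexity 2^(omega N)."""
    return omega * n

for n in (500, 1000):
    gain = log2_cost_single_run(n, 0.035) - log2_cost_restart(n, 0.011)
    print(n, round(gain, 2))  # log2 of the expected speed-up factor
```

The printed gain grows linearly with the size, which is the exponential speed-up discussed in the text; the linear prefactor $N$ only contributes a logarithmic correction.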

Our work therefore provides some theoretical support to the use of S&R [15,16], and in addition suggests a natural
cut-off at which the search is halted and restarted, the determination of which is usually largely empirical and
problem dependent. If a combinatorial problem is efficiently solved (polynomial time) by a search heuristic for some
values of the control parameter of the input distribution, there might be an exponentially small probability that the
heuristic is still successful (in polynomial time) in the range of parameters where resolution almost surely requires
massive backtracking and exponential effort. When the decay rate $\zeta$ of the polynomial time resolution probability
is smaller than the growth rate $\omega$ of the typical exponential resolution time, S&R with a cut-off in the search
equal to a polynomial of the instance size will lead to an exponential speed-up of resolutions.
(1) Choose a variable and its value (T, F) according to
    some heuristic rule (Split);

(2) Analyze the implications of the choice on all the clauses:

    a: If all clauses are satisfied, then stop: a solution is found.
    b: If a contradiction appears, negate the last chosen
       variable and go to 2 (Backtracking).
       If all previously chosen variables have already been
       negated once, then stop: unsatisfiability is proven.
    c: If there is at least one clause with one variable, fix the
       variable to satisfy the clause and go to 2 (Unit Propagation).
    d: Else go to 1.

FIG. 1. DPLL algorithm. When a variable has been chosen at step (1), e.g. x = T, at step (2) some clauses are
satisfied, e.g. C = (x OR y OR z), and eliminated, while others are reduced, e.g. C = (not x OR y OR z) → C = (y OR z).
If some clauses include one variable only, e.g. C = (y), the corresponding variable is automatically fixed to satisfy
the clause (y = T). This unit clause (UC) propagation (2c) is repeated up to the exhaustion of all UC. Contradictions
result from the presence of two opposite UC, e.g. C = (y), C' = (not y). A solution is found when no clauses are left.
The heuristic studied here is the Generalized UC (GUC) rule: a variable is chosen at step (1) from one of the 2-clauses
(or from a 3-clause if no 2-clause is present), and fixed to satisfy the clause. The search process of DPLL is
represented by a tree (Fig. 2) whose nodes correspond to (1), and edges to (2). Branch extremities are marked with
contradictions C (2b), or by a solution S (2a).
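The procedure of Fig. 1 can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the experiments: the signed-integer literal encoding, the names `simplify` and `dpll_guc`, the deterministic choice of the first shortest clause (where a random one could be drawn), and the optional `max_splits` cut-off are all assumptions of the sketch. The returned number of splits plays the role of the search-tree size.

```python
class Cutoff(Exception):
    """Raised when the allowed number of splits is exhausted."""

def simplify(clauses, literal):
    """Set `literal` to True: drop satisfied clauses, shorten the others.

    Returns None if an empty clause (a contradiction) appears.
    """
    out = []
    for clause in clauses:
        if literal in clause:
            continue                          # clause satisfied and eliminated
        reduced = tuple(l for l in clause if l != -literal)
        if not reduced:
            return None                       # contradiction
        out.append(reduced)
    return out

def dpll_guc(clauses, max_splits=None):
    """Return (trail, n_splits, finished).

    trail lists the literals set to True (None if no solution was found);
    finished is False when the search was stopped by the cut-off.
    """
    stats = {"splits": 0}

    def propagate(clauses, trail):
        # Step (2c): unit propagation until no unit clause is left.
        while True:
            unit = next((c[0] for c in clauses if len(c) == 1), None)
            if unit is None:
                return clauses, trail
            clauses = simplify(clauses, unit)
            if clauses is None:
                return None, trail
            trail = trail + [unit]

    def search(clauses, trail):
        clauses, trail = propagate(clauses, trail)
        if clauses is None:
            return None                       # step (2b): contradiction
        if not clauses:
            return trail                      # step (2a): all clauses satisfied
        if max_splits is not None and stats["splits"] >= max_splits:
            raise Cutoff
        stats["splits"] += 1                  # step (1): split
        literal = min(clauses, key=len)[0]    # GUC: from a 2-clause if any
        for choice in (literal, -literal):    # satisfy the clause, then backtrack
            reduced = simplify(clauses, choice)
            if reduced is not None:
                result = search(reduced, trail + [choice])
                if result is not None:
                    return result
        return None

    try:
        trail = search([tuple(c) for c in clauses], [])
    except Cutoff:
        return None, stats["splits"], False
    return trail, stats["splits"], True
```

Variables absent from the returned trail are unconstrained and may take either value. The recursion depth grows with the number of assigned variables, so large instances would call for an iterative version or a raised recursion limit.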
FIG. 2. Phase diagram of 2+p-SAT and first branch trajectories for satisfiable instances. The threshold line
$\alpha_C(p)$ (bold full line) separates sat (lower part of the plane) from unsat (upper part) phases. Departure points
for DPLL trajectories are located on the 3-SAT vertical axis (empty circles). Arrows indicate the direction of "motion"
along trajectories (dashed curves) parameterized by the fraction $t$ of variables set by DPLL. For small ratios
$\alpha < \alpha_L$ ($\simeq 3.003$ for the GUC heuristic), branch trajectories remain confined in the sat phase and
end in S of coordinates (1,0), where a solution is found (with a search process reported on tree A). For ratios
$\alpha_L < \alpha < \alpha_C$, the branch trajectory intersects the threshold line at some point G. A contradiction
a.s. arises before the trajectory crosses the dotted curve $\alpha = 1/(1-p)$ (point D), and extensive backtracking up
to G allows DPLL to find a solution (search tree B). With exponentially small probability, the trajectory (dot-dashed
curve, full arrow) is able to cross the "dangerous" region where contradictions are likely to occur (search tree
similar to A); it then exits from this region (point D') and ends up with a solution (lowest dashed trajectory).
FIG. 3. Histograms of the logarithm $\omega$ of the complexity $Q$ (base 2, and divided by $N$) for $\alpha = 3.5$ and
different sizes $N$. Many instances are drawn randomly and, for each sample, DPLL is run until a solution is found (the
very few unsatisfiable instances that may be present are discarded).
FIG. 4. Log. of complexity using DPLL ($\omega$ - simulations: circles, theory from [3]: dotted line) and S&R
($\zeta$ - simulations: squares, theory: full line) as a function of ratio $\alpha$. Inset: Minus log. of the
cumulative probability $P_{lin}$ of complexities $Q \le N$ as a function of the size for $100 \le N \le 400$ (full
line); log. of the number of restarts $N_{rest}$ necessary to find a solution for $100 \le N \le 1000$ (dotted line)
for $\alpha = 3.5$. Slopes are $\zeta = 0.011$ and $\bar\zeta = 0.0115$ respectively.